Toward the Soundness of Sense Structure Definitions in Thesaurus-Dictionaries. Parsing Problems and Solutions

Authors

  • Neculai Curteanu
  • Mihai Alex Moruz
Abstract

In this paper we point out some difficult problems of thesaurus-dictionary entry parsing, relying on the parsing technology of SCD (Segmentation-Cohesion-Dependency) configurations, successfully applied to six of the largest thesauri: Romanian (2), French, German (2), and Russian.

Challenging Problems: (a) Intricate and/or recursive structures of the lexicographic segments met in the entries of certain thesauri; (b) Cyclic (recursive) calls of some sense marker classes on marker sequences; (c) Establishing the hypergraph-driven dependencies between all the atomic and non-atomic sense definitions. The classical approach to solving these parsing problems is hard, mainly because of the depth-first search of sense definitions and markers, the substantial complexity of entries, and the dynamic construction of the sense tree embodied within these parsers.

SCD-based Parsing Solutions: (a) The SCD parsing method is a procedural tool, completely free of formal grammars, which handles the recursive structure of the lexicographic segments by procedural non-recursive calls performed on the SCD parsing configurations of the entry structure. (b) For dealing with cyclic (recursive) calls between secondary sense markers and the sense enumeration markers, we proposed the Enumeration Closing Condition, sometimes coupled with New_Paragraphs typographic markers

∗ This paper is dedicated to Prof. Svetlana Cojocaru, IMI Director, as a tribute to her high professionalism, genuine friendship, passion and devotion to the special guild of researchers. The authors, with gratitude and best wishes for her sixtieth anniversary!

©2012 by N. Curteanu, A. Moruz


Similar resources

An Optimal and Portable Parsing Method for Romanian, French, and German Large Dictionaries

This paper presents a cross-linguistic analysis of the largest dictionaries currently existing for Romanian, French, and German, and a new, robust and portable method for Dictionary Entry Parsing (DEP), based on Segmentation-Cohesion-Dependency (SCD) configurations. The SCD configurations are applied successively on each dictionary entry to identify its lexicographic segments (the first SCD conf...


Extracting Sense Trees from the Romanian Thesaurus by Sense Segmentation & Dependency Parsing

This paper aims to introduce a new parsing strategy for large dictionary (thesauri) parsing, called Dictionary Sense Segmentation & Dependency (DSSD), devoted to obtaining the sense tree, i.e. the hierarchy of the defined meanings, for a dictionary entry. The real novelty of the proposed approach is that, contrary to dictionary ‘standard’ parsing, DSSD looks for and succeeds in separating the two es...


Combining a Chinese Thesaurus with a Chinese Dictionary

Abstract: In this paper, we study the problem of combining a Chinese thesaurus with a Chinese dictionary by linking the word entries in the thesaurus with the word senses in the dictionary, and propose a similar word strategy to solve the problem. The method is based on the definitions given in the dictionary, but without any syntactic parsing or sense disambiguation on them at all. As a resu...


Parsing vs. Text Processing in the Analysis of Dictionary Definitions

We have analyzed definitions from Webster's Seventh New Collegiate Dictionary using Sager's Linguistic String Parser and again using basic UNIX text processing utilities such as grep and awk. This paper evaluates both procedures, compares their results, and discusses possible future lines of research exploiting and combining their respective strengths. Introduction As natural language systems ...


From Machine Readable Dictionaries to Lexicons for NLP: the Cobuild Dictionaries - a Different Approach

We describe the results of a syntactic-semantic parser for Cobuild dictionary definitions. Unlike previous work on the automatic analysis of machine readable dictionaries, the particular structure of the Cobuild definition allows us to derive information that classifies the lexical item mainly in terms of the selectional restrictions or preferences encoded on its arguments. The resulting formal...



Journal title:
  • The Computer Science Journal of Moldova

Volume 20  Issue 

Pages  -

Publication date: 2012